# load libraries
# reporting
library(knitr)
# visualization
library(ggplot2)
library(ggthemes)
library(patchwork)
# data wrangling
library(dplyr)
library(tidyr)
library(readr)
library(skimr)
library(janitor)
library(magrittr)
# turn off scientific notation
options(scipen=999)18 Making good figures
Learning Objectives
After completing this tutorial you should be able to
- describe detailed characteristics of a figure that is clear, accessible, transparent, and honest.
- describe characteristics of an effective figure legend.
- describe characteristics of an effective description of results.
- describe characteristics of an effective interpretation/discussion of central results.
Download the directory for this project here, make sure the directory is unzipped and move it to your bi328 directory. You can open the Rproj for this module either by double clicking on it which will launch Rstudio or by opening Rstudio and then using File > Open Project or by clicking on the Rproject icon in the top right of your program window and selecting Open Project.
There should be a file named 18_visualizations.qmd in that project directory. Use that file to work through this tutorial - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this quarto document to record your answers. Remember to use comments to annotate your code; at minimum you should have one comment per code set1 you may of course add as many comments as you need to be able to recall what you did. Similarly, take notes in the document as we discuss discussion/reflection questions but make sure that you go back and clean them up for “public consumption”.
1 You should do this whether you are adding code yourself or using code from our manual, even if it isn’t commented in the manual… especially when the code is already included for you, add comments to describe how the function works/what it does as we introduce it during the participatory coding session so you can refer back to it.
Before you get started, let’s make sure to read in all the packages we will need.
18.1 Best practices for effective figures
Effective figures are used to tell a story. You can think of your exploratory analysis as the first step in determining what story the data tells (a fact finding mission if you will). Once you have determined what your specific question hypothesis is, the next step is narrowing down what analysis you will perform. Then you will identify your central results and interpret them in the context of your question. Your next step is communicating your results. At this point you need to identify which figures serve to communicate individual, specific points in the overall narrative.
In our example, we have refined our question to “Have impacts due to natural disasters increased at a global scale?”. To answer this question, we are calculating the total annual loss of life and annual total economic damages incurred for each disaster type.
disaster <- read_delim("data/nat-disasters_emdat-query-2021-10-05.txt", delim = "\t", skip = 6) %>%
clean_names()
total_yr <- disaster %>%
filter(!is.na(total_damages_000_us)) %>%
group_by(year,disaster_type) %>%
summarize(total_damages_yr = sum(total_damages_000_us))
deaths <- disaster %>%
filter(!is.na(total_deaths)) %>%
group_by(year, disaster_type) %>%
summarize(total_deaths_yr = sum(total_deaths)) Then we created bubble plot showing the total loss of life and total damages incurred for each disaster type in each year.
18.2 Best practices for communicating results
Visualizations are a key component to effectively communicating your results. A good rule of thumb is that your written description of your results should allow the reader to understand what you are trying to communicate even without visualizations and that your title & legend should be descriptive enough that even without the written results (some would even say methods) the reader would understand the results as well.
18.3 Practice the best practices!
18.4 Spatial visualization
You have probably noticed that for some of the geographic comparisons were are making the most effective way to communicate those results would be to create a map with countries color coded according to the metric we are assessing.
Let’s learn a straightforward method for visualization of spatial data. Run each of the code chunks below and comment the code line by line to describe what each function/argument is doing.
Install these packages as needed:
install.packages("rnaturalearth")
install.packages("rnaturalearthdata")
install.packages("sf")Now we can load these libraries.
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)The first thing we need is is some maps! The package rnaturalearth provides a map of countries of the entire world. Let’s load a map as an sf class.
world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)[1] "sf" "data.frame"
We see that world is both a data.frame and a class sf(simple features) which is a class of R object designed specifically for plotting maps.
Let’s take a look at the column names
colnames(world) [1] "scalerank" "featurecla" "labelrank" "sovereignt" "sov_a3"
[6] "adm0_dif" "level" "type" "admin" "adm0_a3"
[11] "geou_dif" "geounit" "gu_a3" "su_dif" "subunit"
[16] "su_a3" "brk_diff" "name" "name_long" "brk_a3"
[21] "brk_name" "brk_group" "abbrev" "postal" "formal_en"
[26] "formal_fr" "note_adm0" "note_brk" "name_sort" "name_alt"
[31] "mapcolor7" "mapcolor8" "mapcolor9" "mapcolor13" "pop_est"
[36] "gdp_md_est" "pop_year" "lastcensus" "gdp_year" "economy"
[41] "income_grp" "wikipedia" "fips_10" "iso_a2" "iso_a3"
[46] "iso_n3" "un_a3" "wb_a2" "wb_a3" "woe_id"
[51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent"
[56] "region_un" "subregion" "region_wb" "name_len" "long_len"
[61] "abbrev_len" "tiny" "homepart" "geometry"
You can see that it contains quote a bit of information including some data about each country. The column geometry has the information about the country “shapes” (multipolygons) we want to plot.
world %>%
select(geometry) %>%
head()Simple feature collection with 6 features and 0 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -70.06611 ymin: -18.01973 xmax: 74.89131 ymax: 60.40581
Geodetic CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
geometry
1 MULTIPOLYGON (((-69.89912 1...
2 MULTIPOLYGON (((74.89131 37...
3 MULTIPOLYGON (((14.19082 -5...
4 MULTIPOLYGON (((-63.00122 1...
5 MULTIPOLYGON (((20.06396 42...
6 MULTIPOLYGON (((20.61133 60...
We can plot the map using ggplot::geom_sf() and the same syntax we generally use. We do not need to specify x or y coordinates as ggplot recognizes that we have passed a object of the class sf and that it contains a column called geometry.
ggplot(data = world) +
geom_sf()We can manipulate the geom_sf layer using the same arguments we have used for other plot types. For example, let’s make all the countries orange.
ggplot(data = world) +
geom_sf(color = "black", fill = "orange") Or, we could color code them according to the column pop_est in the world data.frame which contains population estimates for each country.
ggplot(data = world) +
geom_sf(aes(fill = pop_est)) +
scale_fill_viridis_c(trans = "sqrt") +
coord_sf() Let’s say we want to create a map that compares the number of droughts that occurred in each country from 2010 - 2020. First we would need to transform our raw data.
droughts <- disaster %>%
filter(year >= 2010 & year <= 2020) %>%
group_by(iso) %>%
summarize(droughts_total = n())Now we can use left_join() to join the world and drought data.frames. Note that I used the column iso to calculate the number of droughts, this is equivalent to the column iso_a3 in the world database. These are internationally recognized codes that designate every country and most independent areas with either a two or in this case three-letter abbreviations.
world <- world %>%
left_join(droughts, by = c("iso_a3" = "iso"))
head(world)Simple feature collection with 6 features and 64 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -70.06611 ymin: -18.01973 xmax: 74.89131 ymax: 60.40581
Geodetic CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
scalerank featurecla labelrank sovereignt sov_a3 adm0_dif level
1 3 Admin-0 country 5 Netherlands NL1 1 2
2 1 Admin-0 country 3 Afghanistan AFG 0 2
3 1 Admin-0 country 3 Angola AGO 0 2
4 1 Admin-0 country 6 United Kingdom GB1 1 2
5 1 Admin-0 country 6 Albania ALB 0 2
6 3 Admin-0 country 6 Finland FI1 1 2
type admin adm0_a3 geou_dif geounit gu_a3 su_dif
1 Country Aruba ABW 0 Aruba ABW 0
2 Sovereign country Afghanistan AFG 0 Afghanistan AFG 0
3 Sovereign country Angola AGO 0 Angola AGO 0
4 Dependency Anguilla AIA 0 Anguilla AIA 0
5 Sovereign country Albania ALB 0 Albania ALB 0
6 Country Aland ALD 0 Aland ALD 0
subunit su_a3 brk_diff name name_long brk_a3 brk_name
1 Aruba ABW 0 Aruba Aruba ABW Aruba
2 Afghanistan AFG 0 Afghanistan Afghanistan AFG Afghanistan
3 Angola AGO 0 Angola Angola AGO Angola
4 Anguilla AIA 0 Anguilla Anguilla AIA Anguilla
5 Albania ALB 0 Albania Albania ALB Albania
6 Aland ALD 0 Aland Aland Islands ALD Aland
brk_group abbrev postal formal_en formal_fr note_adm0
1 <NA> Aruba AW Aruba <NA> Neth.
2 <NA> Afg. AF Islamic State of Afghanistan <NA> <NA>
3 <NA> Ang. AO People's Republic of Angola <NA> <NA>
4 <NA> Ang. AI <NA> <NA> U.K.
5 <NA> Alb. AL Republic of Albania <NA> <NA>
6 <NA> Aland AI Åland Islands <NA> Fin.
note_brk name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13
1 <NA> Aruba <NA> 4 2 2 9
2 <NA> Afghanistan <NA> 5 6 8 7
3 <NA> Angola <NA> 3 2 6 1
4 <NA> Anguilla <NA> 6 6 6 3
5 <NA> Albania <NA> 1 4 1 6
6 <NA> Aland <NA> 4 1 4 6
pop_est gdp_md_est pop_year lastcensus gdp_year economy
1 103065 2258.0 NA 2010 NA 6. Developing region
2 28400000 22270.0 NA 1979 NA 7. Least developed region
3 12799293 110300.0 NA 1970 NA 7. Least developed region
4 14436 108.9 NA NA NA 6. Developing region
5 3639453 21810.0 NA 2001 NA 6. Developing region
6 27153 1563.0 NA NA NA 2. Developed region: nonG7
income_grp wikipedia fips_10 iso_a2 iso_a3 iso_n3 un_a3 wb_a2
1 2. High income: nonOECD NA <NA> AW ABW 533 533 AW
2 5. Low income NA <NA> AF AFG 004 004 AF
3 3. Upper middle income NA <NA> AO AGO 024 024 AO
4 3. Upper middle income NA <NA> AI AIA 660 660 <NA>
5 4. Lower middle income NA <NA> AL ALB 008 008 AL
6 1. High income: OECD NA <NA> AX ALA 248 248 <NA>
wb_a3 woe_id adm0_a3_is adm0_a3_us adm0_a3_un adm0_a3_wb continent
1 ABW NA ABW ABW NA NA North America
2 AFG NA AFG AFG NA NA Asia
3 AGO NA AGO AGO NA NA Africa
4 <NA> NA AIA AIA NA NA North America
5 ALB NA ALB ALB NA NA Europe
6 <NA> NA ALA ALD NA NA Europe
region_un subregion region_wb name_len long_len
1 Americas Caribbean Latin America & Caribbean 5 5
2 Asia Southern Asia South Asia 11 11
3 Africa Middle Africa Sub-Saharan Africa 6 6
4 Americas Caribbean Latin America & Caribbean 8 8
5 Europe Southern Europe Europe & Central Asia 7 7
6 Europe Northern Europe Europe & Central Asia 5 13
abbrev_len tiny homepart droughts_total geometry
1 5 4 NA NA MULTIPOLYGON (((-69.89912 1...
2 4 NA 1 65 MULTIPOLYGON (((74.89131 37...
3 4 NA 1 26 MULTIPOLYGON (((14.19082 -5...
4 4 NA NA 1 MULTIPOLYGON (((-63.00122 1...
5 4 NA 1 12 MULTIPOLYGON (((20.06396 42...
6 5 5 NA NA MULTIPOLYGON (((20.61133 60...
Now our world data.frame has a column specifying the number of droughts that occurred.
ggplot(data = world) +
geom_sf(aes(fill = droughts_total)) +
scale_fill_viridis_c(trans = "sqrt") +
coord_sf(expand = FALSE) +
labs(x = "longitude", y = "latitude") +
theme_bw() +
theme(legend.position = "bottom")